Discipline Assignment 1

Author

520514492

Published

March 26, 2025

1 Executive summary

Part 1 of this report aims to answer the research question: how does the distribution of coral bleaching vary across latitude zones during the 2014–2017 global bleaching event? To achieve this, an interactive scatter-map visualization is created, which shows that across all years of the event, the highest percentage of bleaching occurred in the near-equatorial and equatorial zones. Considering individual years, this held for 2014 and 2016; in 2015, however, bleaching was highest in tropical midlatitudes, and in 2017 bleaching peaked in tropical lowlatitudes.

Part 2 of this report aims to answer the research question: how does the accuracy of bleaching predictions from current variables compare with that from variables measured four years prior? To achieve this, multiple machine learning (ML) models were trained and deployed. The findings were that models built on the contemporary dataset had a slightly higher average prediction accuracy (45%) than those built on the four-years-prior dataset (43%).

This report was formatted in Quarto (Quarto 2025). R (The R Foundation 2025) is used for most of the report’s basic numerical analysis. The following packages were also used: the Tidyverse collection (Tidyverse 2025), for visualization; Patchwork (Thomas Lin Pedersen 2025), to format graphs; Plotly (plotly 2025), Maps (Richard A. Becker 2025), Crosstalk (Joe Cheng 2025a) and HTMLtools (Joe Cheng 2025b), to create interactive graphs; Simple Features (sf) (Edzer Pebesma 2025), to transform netCDF data; lmtest (Torsten Hothorn 2025), to perform the Durbin-Watson test; and class (Brian Ripley 2025), e1071 (David Meyer 2025), randomForest (Leo Breiman 2025) and cvTools (Andreas Alfons 2025), to train and deploy ML models.

2 Part 1

2.1 Introduction

This section investigates a statement made by the authors of a study (S. Sully 2019) that examines coral bleaching events from 1998 to 2017 across 81 countries. They claim that “the highest probability of coral bleaching occurred at tropical midlatitude sites (15–20 degrees north and south of the Equator)”.

2.2 Formulating problem

From this claim, we derive the following research question and hypothesis:

Research question: How does the distribution of coral bleaching vary across latitude zones during the 2014–2017 global bleaching event?

Hypothesis: The tropical midlatitude zone (15–20°) will show the highest average bleaching levels compared to other latitude zones between 2014 and 2017.

The 2014–2017 period is of special interest, as it represents one of the most intense coral bleaching events on record.

2.3 Exploratory Data Analysis

2.3.1 Data cleaning

The various environmental measurements recorded throughout the study, such as temperature and average bleaching, are contained in an associated dataset. The data stores location as latitude and longitude, which can be transformed into the following six latitudinal zones (Meteoblue n.d.): Equatorial (0–5°), Near-Equatorial (5–10°), Tropical Lowlatitude (10–15°), Tropical Midlatitude (15–20°), Tropical Highlatitude (20–25°), and Subtropical (25–35°).

Missing values, denoted by ‘NA’, are also removed to ensure subsequent operations occur on a complete and consistent dataset. These values could represent sensor failures or missed readings.

Code
p1 <- read.csv("data/Reef_Check_with_cortad_variables_with_annual_rate_of_SST_change.csv")

p1data <- p1 %>%
  drop_na() %>%
  filter(Year >= 2014 & Year <= 2017) %>%
  arrange(Average_bleaching) %>%
  mutate(
    Latitude_Zone = case_when(
      abs(Latitude.Degrees) >= 25 & abs(Latitude.Degrees) < 35 ~ "Subtropical (25-35°)",
      abs(Latitude.Degrees) >= 20 & abs(Latitude.Degrees) < 25 ~ "Tropical Highlatitude (20-25°)",
      abs(Latitude.Degrees) >= 15 & abs(Latitude.Degrees) < 20 ~ "Tropical Midlatitude (15-20°)",
      abs(Latitude.Degrees) >= 10 & abs(Latitude.Degrees) < 15 ~ "Tropical Lowlatitude (10-15°)",
      abs(Latitude.Degrees) >= 5  & abs(Latitude.Degrees) < 10 ~ "Near-Equatorial (5-10°)",
      abs(Latitude.Degrees) >= 0  & abs(Latitude.Degrees) < 5  ~ "Equatorial (0-5°)"    
    )
  )

p1data$Latitude_Zone <- factor(p1data$Latitude_Zone, levels = c(
  "Equatorial (0-5°)",
  "Near-Equatorial (5-10°)",
  "Tropical Lowlatitude (10-15°)",
  "Tropical Midlatitude (15-20°)",
  "Tropical Highlatitude (20-25°)",
  "Subtropical (25-35°)"
))

p1data %>%
  select(Reef.Name, Country, Year, Average_bleaching, Temperature_Mean, Latitude.Degrees, Latitude_Zone) %>%
  head(6) %>%
  knitr::kable(caption = "Sample of Bleaching Dataset")
Sample of Bleaching Dataset
Reef.Name Country Year Average_bleaching Temperature_Mean Latitude.Degrees Latitude_Zone
Fronton Dominican Republic 2014 0 300.30 19.29722 Tropical Midlatitude (15-20°)
Fronton Dominican Republic 2014 0 300.30 19.29722 Tropical Midlatitude (15-20°)
Mae Haad Reef Thailand 2014 0 302.00 10.08500 Tropical Lowlatitude (10-15°)
Aow Leuk Thailand 2014 0 302.01 10.07186 Tropical Lowlatitude (10-15°)
Aow Leuk Thailand 2014 0 302.01 10.07186 Tropical Lowlatitude (10-15°)
Mango Bay Thailand 2014 0 302.06 10.12278 Tropical Lowlatitude (10-15°)
Code
bleachingData = p1data

2.3.2 Data distribution

Before assessing the paper’s claim about bleaching, the distribution of bleaching in the dataset must be investigated. Below is a boxplot visualizing the distribution of average bleaching in each of the latitude zones defined in the earlier section. Average_bleaching is a measure of the bleaching percentage of a particular reef. The y axis, ‘Log Average Bleaching’, is a log transformation of Average_bleaching, which compresses the variable’s range from 100 to 2 (since log10(100) = 2), neatening the plot; note that observations with zero bleaching cannot be displayed on a log scale. It is evident from the graph that average bleaching is relatively evenly distributed across the zones.

Code
ggplot(p1data, aes(x = Latitude_Zone, y = Average_bleaching)) +
  geom_boxplot(outlier.shape = NA, fill = "white", color = "black") +
  scale_y_log10() +
  
  labs(title = "Distribution of Average Bleaching by Latitude Zone",
    x = "Latitude Zone", y = "Log Average Bleaching (%)") +
  theme_minimal() +
  
  theme(axis.text.x = element_text(angle = 30, hjust = 1),
    plot.title = element_text(face = "bold")
  )

2.4 Executing solution

2.4.1 Initial visualization

The following is a barplot of bleaching in each latitude zone from 2014 to 2017. The general trend is that bleaching percentage decreased from 2014 to 2017, with the Near-Equatorial zone having the highest bleaching among all zones in 2016 and 2017. In addition, the Tropical Lowlatitude and Midlatitude zones peaked in 2014 and 2015, respectively.

Code
ggplot(p1data, aes(x = Year, y = Average_bleaching, fill = Latitude_Zone)) +
  # NB: geom_col sums Average_bleaching within each zone-year, so bar heights
  # reflect total recorded bleaching and are influenced by sample counts
  geom_col(position = "dodge") +
  scale_fill_brewer(palette = "Set2") +
  
  labs(x = "Year", y = "Average Bleaching (%)",
    fill = "Latitude Zone", title = "Average Bleaching by Latitude Zone (2014–2017)") +
  
  theme_minimal() + theme(
    plot.title = element_text(face = "bold")
  )

2.4.2 Interactive visualization

While a barplot is useful, a scatter-map visualization lets us detect spatial patterns of bleaching among the latitude zones more intuitively. Below is an interactive geo-scatter visualization, adapted from example Plotly code (plotly Graphing Libraries 2025). We load the map data from Plotly, which gives us the canvas to ‘draw’ bleaching points on. The points are formatted and given a color scale, and then filters for year, zone, and average bleaching are added to the top of the plot.

From the plot at its default settings, it is evident that the highest percentage of bleaching occurred in the near-equatorial and equatorial zones. Adjusting the ‘year’ filter shows that this remains true for 2014 and 2016. In 2015, however, bleaching was highest in tropical midlatitudes, concentrated around the east coast of Australia. In 2017, bleaching peaked in tropical lowlatitudes.

Code
shared_data <- SharedData$new(p1data)

# Plotly map 
map <- plot_ly(data = shared_data, type = "scattergeo", mode = "markers",
  lat = ~Latitude.Degrees, lon = ~Longitude.Degrees,
  
  marker = list(size = 6, opacity = 0.8, color = ~Average_bleaching,
    colorscale = "Magma", colorbar = list(title = "Average Bleaching (%)")),
  
  text = ~paste("Year:", Year, "Zone:", Latitude_Zone, "Bleaching:", 
                Average_bleaching, "%"), hoverinfo = "text") %>%
  
  plotly::layout(title = list(text = "Map of Coral Bleaching"),
    geo = list(showland = TRUE, landcolor = "#416f22", showcountries = TRUE,
      showocean = TRUE, oceancolor = "#1a2845", projection = list(type = "equirectangular")
    )
  )

#  filters
filters <- bscols(
  widths = c(2, 3, 4),
  filter_select("year_filter", "Year", shared_data, ~Year),
  filter_select("zone_filter", "Latitude Zone", shared_data, ~Latitude_Zone),
  filter_slider("bleach_filter", "Average Bleaching (%)", shared_data, ~Average_bleaching, width = "100%")
)

# Render
browsable(tagList(filters, map))

2.5 Evaluation and improvements

2.5.1 Comparing against hypothesis

Insights from the visualizations do not fully support the hypothesis. The tropical midlatitude zone had the highest average bleaching in only one of the four years, 2015, with equatorial zones having higher peaks in 2014 and 2016, and tropical lowlatitudes dominating in 2017. This variability suggests that the claim made, that midlatitudes experienced the highest probability of bleaching, does not hold uniformly during the 2014–2017 bleaching event. Several factors that could explain this inconsistency are explored below.

2.5.2 Statistical significance

There are clearly differences in bleaching across the latitude zones, but are these differences statistically significant, i.e., did they occur beyond pure chance? To ascertain this, we could perform an ANOVA test, comparing mean bleaching across the zones to determine whether the differences are significant.
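As a minimal sketch of what such a test could look like (using made-up data with the same column names, since running it on the full p1data is left as an improvement), base R’s aov performs a one-way ANOVA:

```r
set.seed(1)

# Hypothetical stand-in for p1data, with two of the six zones and invented
# bleaching values; the real test would use p1data and all six zones.
demo <- data.frame(
  Average_bleaching = c(runif(30, 0, 10), runif(30, 5, 20)),
  Latitude_Zone = factor(rep(c("Equatorial (0-5°)",
                               "Tropical Midlatitude (15-20°)"), each = 30))
)

# One-way ANOVA: does mean bleaching differ between zones?
fit <- aov(Average_bleaching ~ Latitude_Zone, data = demo)
summary(fit)  # a small Pr(>F) suggests the difference is beyond chance
```

Because the bleaching distribution is heavily skewed (see the boxplot in Section 2.3.2), a non-parametric alternative such as kruskal.test() may be more appropriate than a plain ANOVA.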

2.5.3 Temporal correlation

In addition, treating and analysing each year of bleaching as if it is independent of the others may be incorrect. To investigate this, a Durbin-Watson test (Kenton 2024) was performed on a linear model of average bleaching against year. The Durbin-Watson (DW) statistic obtained was 0.003, indicating strong positive autocorrelation: bleaching values in one year are highly correlated with values in adjacent years. Therefore, treating each year’s bleaching data as independent observations may lead to incorrect conclusions.

Code
dwtest(lm(Average_bleaching ~ Year, data = p1data))

    Durbin-Watson test

data:  lm(Average_bleaching ~ Year, data = p1data)
DW = 0.0033551, p-value < 2.2e-16
alternative hypothesis: true autocorrelation is greater than 0

2.5.4 Sample distribution

In addition, not all latitude zones were equally sampled in the dataset. The tropical lowlatitude, highlatitude, and subtropical zones were particularly under-sampled compared to others, possibly due to the lower prevalence of reefs in those regions. This uneven distribution of observations could act as a confounding factor, which limits the accuracy of comparisons across zones. That is, if one zone has significantly more measurements than another, we cannot be confident that observed differences in bleaching are wholly due to the zone itself, instead of differences in sample size.

Code
plot1 <- ggplot(p1data, aes(x = Latitude_Zone)) +
  geom_bar(fill = "darkgray") +
  labs(title = "Observations per Latitude Zone",
       x = "Latitude Zone", y = "Count") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 30, hjust = 1))

plot2 <- ggplot(p1data, aes(x = Year)) +
  geom_bar(fill = "darkgreen") +
  labs(title ="Observations per year",
       x = "Year", y = "Count") +
  theme_minimal() 

plot1  + plot2 + plot_layout(guides = 'collect', ncol = 2, nrow = 1) +
  plot_annotation(tag_levels = 'A') &
  theme(legend.position='top')

3 Part 2

3.1 Introduction

This section aims to investigate which environmental variables (salinity, pH, etc.) are more accurate at predicting coral bleaching: present-day variables, or those measured four years prior.

3.2 Formulating problem

Research question: How does the accuracy of bleaching predictions from current variables compare with that from variables measured four years prior?

Hypothesis: The accuracy of bleaching predictions from current variables will be higher than that from four-year-old variables.

3.3 Exploratory Data Analysis

3.3.1 Extracting NCDF data

The explanatory variables of interest (temperature, salinity, pH, nitrogen and algae) are contained within the eReefs dataset, which is distinct and separate from the bleaching dataset in Part 1. This data is stored as a netCDF file, a common format for multidimensional scientific data (DATA3888 2025). We can extract the variables from the netCDF file (DATA3888 2025) and store them in a standard R dataframe for ease of use.

Code
eReefs_nc = ncdf4::nc_open("data/eReefs_assessment.nc")

# Longitude and Latitude
lat = ncdf4::ncvar_get(eReefs_nc, "latitude")
long = ncdf4::ncvar_get(eReefs_nc, "longitude")

# Time
time = ncdf4::ncvar_get(eReefs_nc, "time")
tunits = ncdf4::ncatt_get(eReefs_nc, "time", "units")
cf = CFtime::CFtime(tunits$value, calendar = "standard", time)  # convert time to CFtime class
timestamps = CFtime::as_timestamp(cf)  # get character-string times
timestamps = as.Date(timestamps, format = "%Y-%m-%d")  # convert string to date object instead of the epoch format

# Explanatory variables
temp = ncdf4::ncvar_get(eReefs_nc, "temp")
salt = ncdf4::ncvar_get(eReefs_nc, "salt")
nitrogen = ncdf4::ncvar_get(eReefs_nc, "TOTAL_NITROGEN")
algae = ncdf4::ncvar_get(eReefs_nc, "MA_N")
ph = ncdf4::ncvar_get(eReefs_nc, "PH")

# Convert data to data.frame
eReefs_df = expand.grid(long = long, lat = lat, time = timestamps) %>%
    mutate(temp = as.vector(temp), salt = as.vector(salt), nitrogen = as.vector(nitrogen),
           algae = as.vector(algae), ph = as.vector(ph))

eReefs_noNan = eReefs_df %>% drop_na()

3.3.2 Joining the dataframes

Now that the variables are extracted into a dataframe, the eReefs data is stored in the same format as the bleaching data from Part 1. For simplicity and ease of manipulation, we can join the eReefs and bleaching datasets into a single dataframe that contains both the explanatory variables and their matching bleaching data. This way, a single dataframe can be used to predict bleaching from the explanatory variables.

To achieve this, rows in the eReefs and bleaching datasets from the same year are matched (DATA3888 2025), so long as the reefs the two rows correspond to are within a distance of 1000 units. For the variables from four years prior, this process is repeated, matching rows where the variables were measured four years before the bleaching data.

Code
# #| cache: TRUE
 
# eReefs_noNan = eReefs_df %>% drop_na()
# 
# bleaching_sf = st_as_sf(bleachingData, coords = c("longitude", "latitude"), crs = 4326)
# eReefs_sf = st_as_sf(eReefs_noNan, coords = c("long", "lat"), crs = 4326)
# 
# eReefs_sf <- eReefs_sf %>%
#   mutate(year_eReef = year(time))
# 
# bleaching_sf <- bleaching_sf %>%
#   mutate(year_bleach = year(surveyDate))
# 
# joined_sf = st_join(bleaching_sf, eReefs_sf, join = st_is_within_distance, dist = 1000, left = TRUE)
# 
# joined_df = joined_sf |>
#    as.data.frame() |> drop_na() |>
#    filter(year_eReef == year_bleach)
# 
# joined_df_4yrago <- joined_sf %>% filter(year_bleach - year_eReef == 4) %>%  as.data.frame()

# saveRDS(joined_df, "joined_df")
# saveRDS(joined_df_4yrago, "joined_df_4yrago")

extract_coords <- function(data) {
  coords = st_coordinates(data$geometry)
  long_values = coords[, 1]
  lat_values = coords[, 2]
  
  data = data |>
  mutate(longitude = long_values,
         latitude = lat_values)
}

joined_df <- readRDS("joined_df")
joined_df_4yrago <- readRDS("joined_df_4yrago")

joined_df <- extract_coords(joined_df)
joined_df_4yrago <- extract_coords(joined_df_4yrago)

3.4 Executing solution

3.4.1 Principal Component Analysis

Using explanatory variables (salt, ph, etc.) to predict a category (bleaching), can be achieved with machine learning (ML) models. These models are trained to identify patterns between the variables and the class, such that given a collection of variables for a particular row/instance, the model can predict which class the instance belongs to. Specifically, given values for salt, ph, algae, temperature and nitrogen, the model predicts which category of bleaching a reef of these measurements would experience.

One way to decrease model complexity, and improve performance, is to perform principal component analysis (PCA). PCA finds linear combinations of the variables (principal components) ordered by how much of the variation in the dataset each one explains. This can help reduce the number of variables we use in our models, hence lowering computational cost. For example, if we have a dataset with 20 variables, and the first 3 principal components explain 90% of the total variance, we may be able to train our model using just those 3 components instead of all 20, vastly increasing efficiency.

Performing PCA on the environmental variables, we find that the first three principal components account for 94% of the variance in the contemporary dataset, and 90% of the variance in the dataset from four years prior. We can then select these three principal components to train models with.

Code
set.seed(3888)

perf_pca <- function(data, msg){
  pca_model <- prcomp(data[, c("temp", "nitrogen", "salt", "algae", "ph")], scale. = TRUE)
  
  tab <- knitr::kable(as.data.frame(summary(pca_model)$importance ), caption = paste("PCA Component Importance", msg))
  print(tab)
  
  pca_data <- as.data.frame(pca_model$x[, 1:3])
  colnames(pca_data) <- c("PC1", "PC2", "PC3")
  pca_data$bleachCat <- data$bleachCat
  return(pca_data)
}

pca_res <- perf_pca(joined_df, "(contemporary)")
PCA Component Importance (contemporary)
PC1 PC2 PC3 PC4 PC5
Standard deviation 1.645395 1.033238 0.9703639 0.5257562 0.0840788
Proportion of Variance 0.541460 0.213520 0.1883200 0.0552800 0.0014100
Cumulative Proportion 0.541460 0.754980 0.9433000 0.9985900 1.0000000
Code
pca_res_4y_ago <- perf_pca(joined_df_4yrago, "(4y ago)")
PCA Component Importance (4y ago)
PC1 PC2 PC3 PC4 PC5
Standard deviation 1.587505 1.124066 0.84995 0.6637116 0.2310318
Proportion of Variance 0.504030 0.252700 0.14448 0.0881000 0.0106800
Cumulative Proportion 0.504030 0.756740 0.90122 0.9893200 1.0000000

3.4.2 Model deployment

We deploy three ML models: k-nearest neighbors (kNN), a support vector machine (SVM), and a random forest (RF). kNN determines the class of a data point by looking at the majority class of its k nearest neighbors. Here, “nearness” is quantified using a measure of distance, where data points with similar explanatory variables have a low distance between each other.

SVM classifies data by finding the optimal decision boundary (a line, or more generally a hyperplane) that maximizes the margin between the classes. Its goal is to separate the classes as widely as possible, making each one easier to identify. Decision trees (DTs) build a hierarchical structure of decisions that classifies an example based on its variables. Random forests generate an ensemble of multiple DTs to improve prediction accuracy.

To obtain reliable accuracy estimates, models will be trained and evaluated using 5-fold cross-validation (CV). 5-fold CV randomly splits the dataset into 5 subsets (folds), whereby the model is trained on 4 folds and evaluated on the 1 remaining fold. This process is repeated 5 times, using a different fold for evaluation each time.

Code
set.seed(3888)

cv <- function(data, n_sim = 50, cvK = 5, k_knn = 5) {
  cv_50acc5_knn = cv_50acc5_svm = cv_50acc5_rf = c()

  X = data[, c("PC1", "PC2", "PC3")]
  y = data$bleachCat  
  
  for (i in 1:n_sim) {
    cvSets = cvFolds(nrow(X), cvK)  
    cv_acc_knn = cv_acc_svm = cv_acc_rf = c()

    for (j in 1:cvK) {
      test_id = cvSets$subsets[cvSets$which == j] 
      X_test = X[test_id, ]
      X_train = X[-test_id, ]
      y_test = y[test_id]
      y_train = y[-test_id]

      knn_pred <- knn(train = X_train, test = X_test, cl = y_train, k = k_knn)
      cv_acc_knn[j] = mean(knn_pred == y_test)  
      
      svm_res <- svm(x = X_train, y = as.factor(y_train))  
      svm_pred <- predict(svm_res, X_test)
      cv_acc_svm[j] = mean(svm_pred == y_test)  

      rf_res <- randomForest(x = X_train, y = as.factor(y_train))
      rf_pred <- predict(rf_res, X_test)
      cv_acc_rf[j] = mean(rf_pred == y_test)  
    }

    cv_50acc5_knn <- append(cv_50acc5_knn, mean(cv_acc_knn))
    cv_50acc5_svm <- append(cv_50acc5_svm, mean(cv_acc_svm))
    cv_50acc5_rf <- append(cv_50acc5_rf, mean(cv_acc_rf))
  }

  return(list(
    knn_accuracy = mean(cv_50acc5_knn),
    svm_accuracy = mean(cv_50acc5_svm),
    rf_accuracy = mean(cv_50acc5_rf)
  ))
}

3.5 Evaluation and improvements

3.5.1 Comparing against hypothesis

Upon training and evaluating the models using CV, the following accuracy results were obtained. Model accuracy is defined as the percentage of correct class predictions, i.e., the proportion of predictions where a reef’s predicted bleaching category matches its actual measured category.

Across both datasets, current and four years prior, RF was the most accurate model, followed by kNN and then SVM. Models built on the contemporary dataset had a slightly higher average prediction accuracy (45%) than those built on the four-years-prior dataset (43%). These findings are consistent with the hypothesis.

Code
set.seed(3888)

cv_results <- cv(pca_res)
cv_results_4y_ago <- cv(pca_res_4y_ago) 

cv_results_to_df <- function(data){
  cv_df <- data.frame(
  Model = c("KNN", "SVM", "Random Forest"),
  Accuracy = c(data$knn_accuracy*100, data$svm_accuracy*100, data$rf_accuracy*100))
}

cv_df <- cv_results_to_df(cv_results)
cv_df_4y_ago <- cv_results_to_df(cv_results_4y_ago)
plot1 <- ggplot(cv_df, aes(x = Model, y = Accuracy, fill = Model)) +
  geom_col() + geom_text(aes(label = round(Accuracy, 2)), vjust = -0.5) +
  labs(title = "Model Accuracy (contemporary)", x = "Model", y = "Accuracy (%)") +
  scale_y_continuous(limits = c(0, 60)) +
  theme_minimal() + scale_fill_brewer(palette = "Set1") 

plot2 <- ggplot(cv_df_4y_ago, aes(x = Model, y = Accuracy, fill = Model)) +
  geom_col() + geom_text(aes(label = round(Accuracy, 2)), vjust = -0.5) +
  labs(title = "Model Accuracy (4Y Ago)", x = "Model", y = "Accuracy (%)") +
  scale_y_continuous(limits = c(0, 60)) +
  theme_minimal() + scale_fill_brewer(palette = "Set1") 

plot1  + plot2 + plot_layout(guides = 'collect', ncol = 2, nrow = 1) +
  plot_annotation(tag_levels = 'A') & theme(legend.position='top')

3.5.2 Generally low accuracy

Accuracy across all models in both datasets was low (around 44%). The bleaching event investigated (2015–17) was unprecedented and highly unpredictable, which may explain the low prediction accuracy. In addition, the spatial distance used to join the eReefs and bleaching datasets may have been too large, potentially matching environmental data that does not accurately reflect the conditions at a specific bleaching site. To mitigate this, the matching distance could be reduced, potentially improving accuracy.
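A small sketch of how tightening the radius would change the join (the coordinates below are invented; sf interprets dist in metres for longitude/latitude data):

```r
library(sf)

# One hypothetical reef survey point and two eReefs grid cells,
# roughly 320 m and 900 m to its east (made-up coordinates).
reef  <- st_sfc(st_point(c(147.0000, -18.0)), crs = 4326)
cells <- st_sfc(st_point(c(147.0030, -18.0)),
                st_point(c(147.0085, -18.0)), crs = 4326)

# At the original 1000 m radius, both cells match the reef ...
st_is_within_distance(reef, cells, dist = 1000)

# ... but at 500 m only the nearer cell matches, giving a smaller
# but more spatially faithful joined dataset.
st_is_within_distance(reef, cells, dist = 500)
```

The trade-off is that a tighter radius matches fewer rows, shrinking the training set, so the distance would need tuning against model accuracy.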

3.5.3 Contemporary vs prior data

As discovered above, models built on the contemporary dataset had a higher prediction accuracy than those built on the four-years-prior dataset. This may be because coral bleaches in real time, largely in response to present environmental conditions rather than those of four years ago.

3.5.4 Accuracy variation

The three models also vary in accuracy, which may be due to the nature of each model. RF was consistently the most accurate, perhaps because it is an ensemble of individual decision trees, each of which is a model in itself. This is not the case for kNN and SVM, which are single models.

3.5.5 Accuracy as an incomplete metric

Furthermore, accuracy might not be the best metric to evaluate model performance. The predicted class, the bleaching category (bleachCat), is an ordinal variable ranging from 0 to 5. Therefore, if a reef’s bleaching category is 5, predicting a 1 is worse than predicting a 3, since 3 is closer to 5. However, accuracy alone, a binary comparison of whether a prediction equals the actual value, cannot capture such nuance; it may therefore give an incomplete view of the model’s true performance. Models and/or evaluation methods built for ordinal data may be a more useful avenue to explore.
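As a small illustration of this point (with invented prediction vectors, not output from the models above), mean absolute error on the ordinal categories distinguishes near misses from far misses even when exact-match accuracy is identical:

```r
actual    <- c(5, 5, 2, 0, 3)   # hypothetical true bleachCat values
pred_near <- c(4, 4, 2, 1, 3)   # wrong answers are close to the truth
pred_far  <- c(1, 0, 2, 4, 3)   # wrong answers are far from the truth

# Exact-match accuracy treats both prediction sets identically ...
mean(pred_near == actual)  # 0.4
mean(pred_far == actual)   # 0.4

# ... but mean absolute error rewards the near misses.
mae <- function(pred, truth) mean(abs(pred - truth))
mae(pred_near, actual)  # 0.6
mae(pred_far, actual)   # 2.6
```

Ordinal-aware metrics like this, or ordinal regression models, would better reflect how far each prediction is from the truth.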

4 References

Andreas Alfons. 2025. cvTools: Cross-Validation Tools for Regression Models. https://cran.r-project.org/web/packages/cvTools/index.html.
Brian Ripley. 2025. Package “Class”. https://cran.r-project.org/web/packages/class/class.pdf.
DATA3888. 2025. “DATA3888_2025 Lab Week 1.” DATA3888. https://canvas.sydney.edu.au/courses/65141/pages/week-1-assumed-data-science-concepts-and-reef?module_item_id=2635598.
David Meyer. 2025. E1071: Misc Functions of the Department of Statistics, Probability Theory Group. https://cran.r-project.org/web/packages/e1071/index.html.
Edzer Pebesma. 2025. Simple Features for r. https://r-spatial.github.io/sf/.
Joe Cheng. 2025a. Crosstalk: Inter-Widget Interactivity for HTML Widgets. https://cran.r-project.org/web/packages/crosstalk/index.html.
———. 2025b. Htmltools: Tools for HTML. https://cran.r-project.org/web/packages/htmltools/index.html.
Kenton, Will. 2024. “Durbin Watson Test: What It Is in Statistics, with Examples.” Investopedia.
Leo Breiman. 2025. randomForest: Breiman and Cutlers Random Forests for Classification and Regression. https://cran.r-project.org/web/packages/randomForest/index.html.
Meteoblue. n.d. “General Climate Zones.” Meteoblue. https://content.meteoblue.com/en/research-education/educational-resources/meteoscool/general-climate-zones.
plotly. 2025. Plotly r Open Source Graphing Library. https://plotly.com/r/.
plotly Graphing Libraries. 2025. “Scatter Plots on Maps in R.” https://plotly.com/r/scatter-plots-on-maps/.
Quarto. 2025. Quarto: An Open-Source Scientific and Technical Publishing System. https://quarto.org/.
Richard A. Becker. 2025. Maps: Draw Geographical Maps. https://cran.r-project.org/web/packages/maps/index.html.
S. Sully, M. K. Donovan, D. E. Burkepile. 2019. “A Global Analysis of Coral Bleaching over the Past Two Decades.” Nature Communications. https://www.nature.com/articles/s41467-019-09238-2#Sec7.
The R Foundation. 2025. The r Project for Statistical Computing. The R Foundation. https://www.R-project.org/.
Thomas Lin Pedersen. 2025. Patchwork. https://patchwork.data-imaginist.com/.
Tidyverse. 2025. Tidyverse. https://www.tidyverse.org/.
Torsten Hothorn. 2025. Lmtest: Testing Linear Regression Models. https://cran.r-project.org/web/packages/lmtest/index.html.